You may have heard the phrase, "Your [Machine Learning] model is only as good as its training data." If not, no worries. In this tutorial, we'll explore how to get from raw data to a production-ready Machine Learning application using the Machine Learning Lifecycle.
Machine Learning Lifecycle (MDLC)
Summary
Similar to the Software Development Lifecycle (SDLC), the Model Development Lifecycle (MDLC), or "Machine Learning Lifecycle," is an iterative process used by Data Scientists when developing and productionizing new Machine Learning models.
The MDLC consists of the following steps:
Business Understanding
Before any action takes place, you need to understand the problem you're trying to solve, whether for the company you work for or for an outcome you're trying to produce on your own.
This step consists of:
Gathering Requirements (e.g., tools, budget, team members)
Setting Goals and Milestones (e.g., when to check in with stakeholders, which tasks to accomplish in which time frames)
Conducting an ROI analysis (businesses use this to determine whether the benefits of the model outweigh the costs of producing it).
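As a quick, hypothetical illustration of the arithmetic behind an ROI analysis (all dollar figures below are made up):

    # Hypothetical figures for illustration only.
    annual_benefit = 250_000  # e.g., revenue gained or costs saved by the model
    annual_cost = 100_000     # e.g., infrastructure, salaries, licensing

    # ROI = (benefit - cost) / cost
    roi = (annual_benefit - annual_cost) / annual_cost
    print(f"ROI: {roi:.0%}")  # prints "ROI: 150%"

If the ROI comes out negative or marginal, that's a signal to revisit whether an ML model is worth building at all.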
Model Selection
Once you have a clear understanding of the problem that needs to be solved or the outcome that needs to be produced, the following needs to take place:
Deciding if an ML Model is necessary. As mentioned in the video, if a rule-based system or a similar approach can solve the problem, it should be used, as it will almost always keep costs lower and reduce overhead (such as the number of employees overseeing the model's performance).
Classification vs. Regression. If you decide that ML is the way to go, you need to decide whether to predict a continuous value based on a specified set of Features (discussed in a later tutorial) or to classify items into a set number of classes based on those Features (a minimal sketch of both follows this list).
Can you use a ready-made solution? As the business world has evolved, large companies such as Amazon and Google have developed templated ML models on their respective Cloud Services. Some examples include Google's Dialogflow (for Natural Language Processing) and Vision AI (for Image Recognition). These solutions can save a lot of time and money, though if you're working with sensitive data, such as financial transactions, you should NOT use these pre-trained models.
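To make the Classification vs. Regression distinction concrete, here is a minimal sketch using scikit-learn (one library choice among many; the toy housing data is invented for the example):

    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Regression: predict a continuous value (here, a sale price) from a Feature.
    X = [[1200], [1500], [1800]]           # toy feature: square footage
    y_price = [200_000, 250_000, 300_000]  # toy target: sale price
    reg = LinearRegression().fit(X, y_price)
    print(reg.predict([[1650]]))           # -> an estimated price

    # Classification: assign each item to one of a set number of classes.
    y_class = [0, 0, 1]  # toy target: 0 = "starter home", 1 = "family home"
    clf = LogisticRegression().fit(X, y_class)
    print(clf.predict([[1650]]))           # -> a class label (0 or 1)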
Data Collection
If you work for a business, there will likely be an existing Data Warehouse or Data Lake with plenty of data available to train your model. However, if you're working on a personal project or creating an independent ML Application, you'll need to collect data from various sources. Listed below are the most common types of data sources (a short loading sketch follows the list):
APIs
Text Files (e.g., CSV, TSV, JSON)
Web Scraping
OLTP Databases (PostgreSQL, MySQL, MS SQL Server)
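As a sketch of two of these sources, the snippet below reads a CSV with pandas and pulls JSON from an API with requests (the file name and URL are placeholders, not real endpoints):

    import pandas as pd
    import requests

    # Text file source: read a local CSV into a DataFrame.
    df = pd.read_csv("transactions.csv")  # placeholder file name

    # API source: request JSON and flatten it into a DataFrame.
    response = requests.get("https://api.example.com/v1/orders")  # placeholder URL
    response.raise_for_status()
    orders = pd.json_normalize(response.json())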
Data Annotation is the process of adding tags, or labels, to images to help Image Recognition models know what to look for when comparing images.
Data Preparation (Not Mentioned in the video)
Just because you collected data from a trustworthy source doesn't mean the data is in a valid format for model training. This step typically involves the following:
Building an ETL (Extract, Transform, Load) Pipeline to clean the data to fit your model's needs (see the sketch after this list).
Building a Feature Store to view all potential features.
Creating Unit Tests for Data Pipelines to ensure the data meets certain criteria.
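Here is a minimal pandas sketch of the "T" in ETL plus a unit-test-style check (the amount column and validity rules are invented for the example):

    import pandas as pd

    def transform(raw: pd.DataFrame) -> pd.DataFrame:
        """Toy transform step: coerce types and drop invalid rows."""
        clean = raw.copy()
        clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
        clean = clean.dropna(subset=["amount"])
        return clean[clean["amount"] >= 0]  # negative amounts are invalid here

    def test_transform_removes_invalid_rows():
        """Unit test: pipeline output must meet our criteria."""
        raw = pd.DataFrame({"amount": ["10.5", "oops", "-3"]})
        out = transform(raw)
        assert out["amount"].notna().all()
        assert (out["amount"] >= 0).all()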
Exploratory Data Analysis (Not Mentioned in the video)
Once your data meets your organization's (or your personal) standards for cleanliness, you need to do a deep dive into it. This step typically involves the following steps:
Creating Box Plots to identify and remove any outliers.
Using SQL to identify and remove any NULL values.
Creating Histograms to check whether your data follows a normal distribution (important for many linear models).
Creating Correlation Matrices to eliminate collinearity (a combined sketch follows this list).
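A compact pandas/matplotlib sketch of these four checks, assuming df is the cleaned, numeric DataFrame from the previous step:

    import matplotlib.pyplot as plt

    # Box plots: eyeball outliers in each numeric column.
    df.plot(kind="box")
    plt.show()

    # NULL check (the pandas equivalent of a SQL "WHERE col IS NULL" query).
    print(df.isna().sum())

    # Histograms: check whether each feature looks roughly normally distributed.
    df.hist(bins=30)
    plt.show()

    # Correlation matrix: flag highly correlated (collinear) feature pairs.
    print(df.corr(numeric_only=True))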
Training, Tuning, and Evaluation
Assuming you're confident in the accuracy and quality of your Training Data, you can move on to fitting your model to that data. This step typically involves the following steps:
Determining whether to use a Statistical Model or a Deep Learning Model
Hyperparameter Tuning
Evaluation on a held-out Test Dataset (a sketch of all three follows this list)
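A minimal scikit-learn sketch covering all three sub-steps, assuming X and y are the features and target produced during Data Preparation:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Hold out a Test Dataset for the final evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Model choice + Hyperparameter Tuning: grid search with cross-validation.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    )
    grid.fit(X_train, y_train)

    # Evaluation: measure performance once, on data the model has never seen.
    print(accuracy_score(y_test, grid.best_estimator_.predict(X_test)))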
Deployment
Once you're satisfied with your model's performance, you can move on to serving your model to your end users. This step typically involves the following steps:
Identifying a Serving Framework (a minimal serving sketch follows this list)
Deployment to Cloud (typically via Docker)
Artifact Management
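As one example of a serving framework, here is a minimal FastAPI sketch; the model file name and feature schema are placeholders you would replace with your own:

    from typing import List

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.joblib")  # placeholder artifact from training

    class PredictRequest(BaseModel):
        features: List[float]  # placeholder schema; match your model's Features

    @app.post("/predict")
    def predict(req: PredictRequest):
        prediction = model.predict([req.features])
        return {"prediction": prediction.tolist()}

Packaged into a Docker image, a service like this can be deployed to any cloud container platform, and files like model.joblib are exactly the artifacts that Artifact Management keeps track of.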
Monitoring
The MDLC is called a "lifecycle" because it runs in a continuous loop. Monitoring your model's performance as it's used in production is critical to maintaining a consistent infrastructure. This step typically involves the following steps (a logging sketch follows the list):
Live Monitoring
Inference Accuracy Measurement
Notification Setup and Log Tracking
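A minimal sketch of structured prediction logging (the log format is an assumption, not a standard); joining these logs against ground-truth labels later yields inference accuracy, and alert rules over the logs provide notifications:

    import json
    import logging
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("model_monitor")

    def log_prediction(features, prediction):
        """Write one structured log line per inference for later analysis."""
        logger.info(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "features": features,
            "prediction": prediction,
        }))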
Retrain
Following the concept of Continuous Integration/Continuous Deployment (CI/CD), you will want to add new features to your model and collect new training and testing data from your users to enhance your model's performance.
You should retrain your model based on production monitoring logs and changing business requirements.
Collect new data from user inputs and retrain your model with the new data.
Continuously perform hyperparameter tuning to ensure your model is performing at its best (a retraining sketch follows this list).
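A minimal retraining sketch, reusing grid and the train/test variables from the earlier training sketch and assuming pandas data structures; load_new_labeled_data, old_score, and deploy are hypothetical stand-ins for your own data loading, current production score, and deployment step:

    import pandas as pd

    # New labeled examples collected from user inputs in production.
    X_new, y_new = load_new_labeled_data()  # hypothetical helper

    X_retrain = pd.concat([X_train, X_new])
    y_retrain = pd.concat([y_train, y_new])

    grid.fit(X_retrain, y_retrain)  # re-run hyperparameter tuning on fresh data
    new_score = grid.best_estimator_.score(X_test, y_test)

    # Only promote the retrained model if it beats the current one.
    if new_score > old_score:         # old_score: current production model's score
        deploy(grid.best_estimator_)  # hypothetical deployment step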
Although the MDLC can seem like a lot to handle, existing frameworks such as MLflow, Amazon SageMaker, and Kubeflow make managing your MDLC much simpler.
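For example, MLflow can record each pass through the lifecycle in a few lines (a minimal sketch; the parameter and metric names are illustrative, and grid comes from the training sketch above):

    import mlflow
    import mlflow.sklearn

    with mlflow.start_run():
        mlflow.log_param("n_estimators", 300)     # what was trained
        mlflow.log_metric("test_accuracy", 0.91)  # how it performed
        mlflow.sklearn.log_model(grid.best_estimator_, "model")  # the artifact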